AGP/DDR INTERFACES FOR FULL SWING AND REDUCED SWING
(SSTL) SIGNALS ON AN INTEGRATED CIRCUIT CHIP 



Inventors: Nalini Ranjan and Xiaoyi Guo 
FIELD OF THE INVENTION

The present invention relates generally to interfaces for integrated circuits
and, more particularly, to input/output (I/O) interfaces that can be configured
on-the-fly to comply with multiple protocols and signal specifications.

BACKGROUND OF THE INVENTION

Since the advent of integrated circuits, I/O interfaces have been used for
the purpose of interfacing internal (i.e., on-chip) circuits to external (i.e., off-chip)
circuits. I/O interfaces typically are designed using asynchronous circuits which
operate without a clock. As the frequency of signals through the interface
increases, however, it becomes more difficult to capture and transmit signals
using asynchronous circuits. In an asynchronous circuit, signals ripple through
the system, set and reset flip-flops, and produce an output at some unpredictable
future time dependent on system propagation delays. Because signals can
happen at any time, asynchronous circuits are prone to being upset by noise in
the system. For example, a noise burst on a signal line between clock pulses
could cause a number of flip-flops to change state and cause a system
malfunction.

In contrast, synchronous circuits, such as edge-triggered "D" flip-flops, can 
be used to reliably capture and transmit signals on either the positive-edge or the 
negative-edge of a clock pulse or strobe. A noise burst on a signal line between 
clock pulses does not typically upset a synchronous circuit. While synchronous 
circuits have been included in I/O interfaces for testing purposes (e.g., scan 
testing), they are not typically used to capture data signals communicated 
between core logic and pads in an integrated circuit chip. To reliably capture 
such data signals, synchronous circuits must have clocks that abide by a variety 
of constraints including skew, duty cycle, and setup/hold times. If these clock
parameters are violated, the synchronous circuits could malfunction (e.g., clock 
race, latch-up) resulting in erroneous data signals being captured and 
transmitted. To reduce the probability of synchronous circuits malfunctioning, 
such circuits can be designed using custom physical layouts. 

Custom physical layouts place physical placement control and constraints 
on the components of the I/O interface so as to restrict the variability of critical 
parameters, thereby ensuring reliable high frequency operation. By designing 
components to have a tight relationship to each other, any uncertainty in the 
operation and/or compensation of such components can be minimized. For
example, clock and data paths can be accurately matched as well as designed to
compensate for simultaneous switching push-out. 

In addition to reliability concerns, it is also desirable that I/O interfaces be 
configurable on-the-fly to comply with multiple protocols and signal 
specifications including: Accelerated Graphics Port (AGP), Double Data Rate
(DDR), Peripheral Component Interconnect (PCI), Stub Series Terminated Logic
(SSTL), and Transistor-to-Transistor Logic (TTL). Such I/O interfaces provide
additional flexibility to system designers. 

Accordingly, there is a need for reliable and flexible I/O interfaces for
buffering and conditioning data signals between core logic and pads in
integrated circuit chips. It is desirable that these I/O interfaces be configurable
on-the-fly to comply with multiple protocols and signal specifications. Such I/O
interfaces should have custom physical layouts of circuitry, power, and clock
bussing to eliminate problems associated with, for example, uneven layout
traces.

SUMMARY OF THE INVENTION

The present invention is directed to reliable I/O interfaces for an 
integrated circuit chip that can be configured on-the-fly to comply with a 
plurality of protocols and signal specifications. 

An I/O interface of the present invention preferably includes latches, 
clocks, and conditioning circuits implemented in a custom physical layout to 
produce reliable and flexible interfaces to high frequency busses running a 
plurality of protocols and signal specifications. Three clock trees are used to 
synchronize the buffering and conditioning of input/output signals before
sending such signals to a pad or core. The clock trees are implemented via
custom layouts to allow tight control of clock/strobe parameters (e.g., skew, duty
cycle, rise/fall times).

Two of the clock trees are local to the I/O interface and trigger a plurality 
of output latches configured on-the-fly to buffer output data signals from the 
core in either asynchronous or synchronous format. In the synchronous mode, a 
clock/strobe could be either edge-centered or window-strobe with respect to the
data. The third clock tree distributes clock/strobes from an external source and
is used to trigger a plurality of input latches configured on-the-fly to buffer input
data from the pad in either a window-strobe mode or an edge-centered mode.
The I/O interface also includes conditioning circuits that condition the I/O
signals to be compliant with AGP/DDR protocols, as well as full swing, reduced
swing (SSTL), and TTL specifications.

The present invention places signals on the external bus at either the
positive-edge or negative-edge of external clock/strobe pulses (i.e., single data
rate) or both edges of external clock/strobe pulses using either internal
clocks/strobes that are operating at twice the frequency (e.g., double data rate) or
internal clocks/strobes that are operating at the same frequency as the external
clocks.

In one embodiment of the present invention, an I/O interface for an
integrated circuit chip includes a core, a pad, and a data buffer disposed between
the core and the pad. The data buffer comprises an output circuit and an input
circuit. The output circuit includes a plurality of output latches, two output clock
trees, and a first signal conditioning circuit. Each output latch has an input
coupled to receive output data signals from the core. Each output clock tree is
coupled to at least one of the output latches for triggering the latching of the
output data signals from the core. The first signal conditioning circuit is coupled
to the output latches for conditioning the output data signals so that the output
data signals are compliant with at least one of a plurality of protocols.

The input circuit includes a plurality of input latches, an input clock tree,
and a second signal conditioning circuit. Each input latch has an input coupled
to receive input data signals from the pad. The input clock tree is coupled to at least
one of the input latches for triggering the latching of the input data signals from 
the pad. The second signal conditioning circuit is coupled to the input latches for 
conditioning the input data signals so that the input data signals are compliant 
with at least one of the plurality of protocols. 

In another embodiment of the present invention, an I/O interface for an 
integrated circuit chip includes a core, a pad, and a clock/strobe buffer disposed
between the core and the pad. The clock/strobe buffer generally includes a
differential amplifier, a programmable delay module, and a gated buffer. The 
differential amplifier is coupled to the pad for receiving AGP and DDR/SSTL 
signals. The programmable delay module is coupled to the differential amplifier 
for delaying the AGP and DDR/SSTL signals. The gated buffer is coupled to the 
programmable delay module and an input clock tree. The gated buffer is for 
distributing the AGP and DDR/SSTL signals to the I/O interface via the input 
clock tree. 

In another embodiment of the present invention, a clock/strobe buffer
includes an output circuit and an input circuit. The output circuit generally
includes an output latch, a multiplexer, and a first conditioning circuit. The
output latch has an input coupled to receive output clock/strobe signals from the
core. The output latch also has an input coupled to a first output clock tree for
triggering the latching of output clock/strobe signals from the core. The first
output clock tree is disposed in the I/O interface. The multiplexer has an input
coupled to a second output clock tree. The multiplexer reduces the skew on the
output clock/strobe signals. The second output clock tree is also disposed in the
I/O interface. The first signal conditioning circuit is coupled to the output latch
and the multiplexer. The first signal conditioning circuit conditions the output
clock/strobe signals so that the output clock/strobe signals are compliant with at
least one of a plurality of protocols.

The input circuit includes a differential amplifier and a gated buffer. The 
differential amplifier has an input coupled to the pad for receiving differential 
clock/strobe signals complying with one of a plurality of protocols. The gated
buffer is coupled to the differential amplifier and an input clock tree for
distributing the clock/strobe signals via the input clock tree.

An advantage of the present invention is to provide synchronous data 
capture from the core or pad. Using a clock to synchronize data capture from 
the core or pad increases the reliability of such data capture when operating in 
AGP or DDR/SSTL modes.

Another advantage of the present invention is that data can be output to a
bus at both edges of an external clock without using an internal clock that operates
at twice the frequency of the external clock. 

Another advantage of the present invention is that physical placement 
control and constraints are placed on each of the components of the I/O 
interface, thereby limiting process variations in components that can adversely 
affect reliable high frequency operation.

Still another advantage of the present invention is that the components of 
the I/O interface are tightly related, thereby minimizing uncertainty in the 
operation and/or compensation of each component. For example, clock and data
paths can be accurately matched while compensating for simultaneous switching 
push-out. 

These and other features, aspects, and advantages of the present invention 
will become better understood with regard to the following description, 
appended claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A is a functional block diagram illustrating one embodiment of an
I/O interface for an integrated circuit including a data buffer and a clock/strobe
buffer in accordance with the present invention.

Figure 1B is a set of timing diagrams illustrating clocks which may be used
by the data buffer 130 to capture data signals from the core or pad in Figure 1A.

Figures 2A and 2B are functional block diagrams of one embodiment of the
data buffer in Figure 1A in accordance with the present invention.

Figure 3 is a functional block diagram of one embodiment of the
clock/strobe buffer in Figure 1A including a programmable delay module in
accordance with the present invention.

Figures 4A and 4B are functional block diagrams of one embodiment of the
clock/strobe buffer in Figure 1A including a matching delay dummy multiplexer
in accordance with the present invention.

Figure 5 is a functional block diagram of one embodiment of the matching
delay dummy multiplexer in Figure 4A in accordance with the present invention.

Figure 6A is a layout representation illustrating one embodiment of a
bussing/n-well scheme in accordance with the present invention.

Figure 6B is a cross-sectional view of the layout representation in Figure
6A in accordance with the present invention.

Figure 7 is a layout representation illustrating several embodiments of an 
I/O assembly for an integrated circuit chip in accordance with the present 
invention. 

Figure 8 is a layout representation illustrating one embodiment of an I/O 
assembly for AGP in accordance with the present invention. 

Figure 9 is a layout representation illustrating one embodiment of an I/O 
assembly for DDR in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

It is noted that the term protocol, as used herein, includes operation modes
and interface specifications (e.g., signal, electrical). The term signal(s), as used
herein, includes data signals, clock/strobe signals, power signals, and control
signals. The term "on-the-fly," as used herein, includes real-time operation of the
integrated circuit chip (e.g., during the execution of user applications). The term
clock/strobe, as used herein, includes both clocks and strobes.

Referring to Figure 1, there is shown a functional block diagram of one
embodiment of an I/O interface 100 for an integrated circuit chip in accordance
with the present invention. The I/O interface 100 includes pads 110-1 and 110-2,
a core 120, a data buffer 130, and a clock/strobe buffer 140.

The data buffer 130 preferably is part of a custom designed I/O cell
disposed in an I/O ring of the integrated circuit chip (not shown). While only
one I/O interface 100 is shown in Figure 1, it is noted that multiple I/O cells
having multiple I/O interfaces can be duplicated and disposed within the I/O
ring.

The data buffer 130 preferably is coupled to the pad 110-1, the core 120, 
and the clock/strobe buffer 140, for buffering and conditioning signals
communicated between the pad 110-1 and the core 120. The data buffer 130 may
be configured on-the-fly to condition signals to comply with a particular 
protocol. These protocols preferably include: AGP, DDR, PCI, SSTL, and TTL. 
Alternatively, the data buffer 130 may be configured at a board design stage. 

The clock/strobe buffer 140 preferably is part of a custom designed I/O
cell disposed in the I/O ring of the integrated circuit chip. The clock/strobe
buffer 140 is coupled to the pad 110-2, the core 120, and the data buffer 130, for
buffering and conditioning clocks/strobes used to trigger events in the data
buffer 130. The clock/strobe buffer 140 may be configured on-the-fly to
condition internal and external clocks/strobes for use by the data buffer 130.

The data buffer 130 and the clock/strobe buffer 140 are preferably
implemented using 0.25µ CMOS technology as described in further detail below.
Some of the benefits of implementing the data buffer 130 using 0.25µ CMOS
technology include: (a) the ability to group together I/O pads of the same bus to
control the skews in the group, (b) the ability to control layout traces for
clock/strobe signals, (c) the ability to shield the signals Vbias and Vref (described
below), (d) the ability to control the power/ground supply to minimize delay,
skew, and imbalance, and (e) the ability to configure the data buffer 130 and the
clock/strobe buffer 140 to meet different operation modes and different interface
specifications.

Referring to Figure 1B, there is shown a set of timing diagrams illustrating
clocks which may be used by the data buffer 130 to capture data signals from the
core or pad in Figure 1A.

A first timing diagram illustrates a single rate clock 150 (hereinafter also 
referred to as "CCK_1X"). The single rate clock 150 has a period T and may be 
used to synchronize events in the data buffer 130 at a frequency of 66 MHz 
(PCI/TTL or "1X" AGP) or 150 MHz (DDR/SSTL). The positive-edge 180
and/or the negative-edge 190 of a pulse of the single rate clock 150 may be used
to synchronize events. 

A second timing diagram illustrates a double rate clock 160 (hereinafter 
also referred to as "DCK_2X"). The double rate clock 160 has a period T/2 and 
may be used to synchronize events in the data buffer 130 at a frequency of 133 
MHz ("2X" AGP). The double rate clock 160 provides two positive-edges for 
every pulse of the single rate clock 150. 

A third timing diagram illustrates a delayed clock 170. The delayed clock 
170 is delayed by T/4 and is another embodiment of the double rate clock 160. 
The combination of the single rate clock 150 and the delayed clock 170 provides 
two positive-edges that may be used to synchronize events in the data buffer 130 
at a frequency of 300 MHz. The combination of clock 150 and clock 170 enables 
data signals to be output to a bus at twice the frequency of the single rate clock
150 without using an internal clock that operates at twice the frequency of the
external clock. 
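
The edge counts described for the three timing diagrams can be checked with a small model. The following is only a sketch: the function name `rising_edges` and the use of an arbitrary unit period are assumptions made for illustration, and the clocks are treated as ideal.

```python
from fractions import Fraction

def rising_edges(period, phase, n_periods):
    """Times of the positive edges of an ideal clock with the given period and phase."""
    return [phase + k * period for k in range(n_periods)]

T = Fraction(1)  # period of the single rate clock 150, arbitrary units (assumed)

single = rising_edges(T, Fraction(0), 4)       # single rate clock 150, period T
double = rising_edges(T / 2, Fraction(0), 8)   # double rate clock 160, period T/2
delayed = rising_edges(T, T / 4, 4)            # delayed clock 170, shifted by T/4

# Combining clock 150 with clock 170 yields the positive edges of both.
combined = sorted(single + delayed)

# Count positive edges that fall inside the first period [0, T).
edges_per_period_double = sum(1 for t in double if 0 <= t < T)
edges_per_period_combined = sum(1 for t in combined if 0 <= t < T)
```

In this model both the double rate clock and the combination of the single rate and delayed clocks supply two positive edges per period T, which is the property the text relies on. The combined edges fall at 0 and T/4 within each period.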

Referring to Figures 2A and 2B, there is shown a functional block diagram 
of one embodiment of the data buffer 130 in accordance with the present 
invention. The data buffer 130 includes an output circuit 200 (Figure 2A) and an
input circuit 202 (Figure 2B). The output circuit 200 and the input circuit 202 are
each coupled to the pad 110-1 and the core 120 and provide the circuitry that
buffers and conditions signals communicated across the I/O interface 100. The
output circuit 200 is coupled to the input circuit 202 at letter designations A and
B.

The output circuit 200 includes a data capture circuit 204 and a signal
conditioning circuit 206. The data capture circuit 204 includes output clock tree
208 (hereinafter also referred to as "CCK_1X"), output clock tree 210 (hereinafter
also referred to as "DCK_2X"), latches 212-1 through 212-3, and buffer gates
214-1 through 214-5. The latches 212-1 through 212-3 are, for example,
edge-triggered "D" flip-flops. Each of the latches 212-1 through 212-3 is reset by a
reset line 216 provided by the core 120.

The latch 212-1 has an input 218 and an input 220. The input 218 is 
coupled to receive "even" data (SSTL) or "2X" data (AGP) from the core 120. 
The input 220 is coupled to the output clock tree 208 for receiving a clock/strobe.
The output clock tree 208 provides either a data clock/strobe (SSTL) or a "2X" 66
MHz clock/strobe (AGP). The latch 212-1 is further coupled to the buffer gates
214-1 and 214-2 by lines 222-1 and 222-2, respectively. The buffer gates 214-1 and 
214-2 are coupled to the signal conditioning circuit 206 by a line 224 for further 
processing as described in detail below. 

The latch 212-2 has an input 226 and an input 228. The input 226 is 
coupled to receive "odd" data (SSTL) from the core 120. The input 228 is coupled 
to receive an inverted clock/strobe from the output clock tree 208. The latch
212-2 is further coupled to the buffer gate 214-3 by a line 230. The buffer gate 214-3 is
coupled to the signal conditioning circuit 206 by the line 224 for further 
processing as described in detail below. 

The latch 212-3 has inputs 232-1 and 234. The input 232-1 is coupled to 
receive "1X" data (AGP) from the core 120. The input 234 is coupled to the
output clock tree 210 for receiving a clock/strobe. The clock tree 210 provides a
memory clock/strobe (SSTL) or a "1X" 66 MHz clock/strobe (AGP). The latch
212-3 is further coupled to the buffer gate 214-4 by a line 236. Additionally, an 
input 232-2 is coupled directly to the buffer gate 214-5, thereby bypassing the 
latch 212-3. 

The latches 212-1 through 212-3 capture and hold data from the core 120
until released at the next clock/strobe. The buffer gates 214-1 through 214-5 provide
additional current drive to the signal conditioning circuit 206 to prevent
overloading of the data capture circuit 204. The output clock trees 208 and 210
distribute double rate clock/strobes and single rate clock/strobes, respectively.
The double rate clock/strobe operates at twice the frequency of the single rate
clock/strobe (Figure 1B). Both the double rate clock/strobe and the single rate
clock/strobe are derived from, for example, a phase locked loop (PLL) in the core
120.

The signal conditioning circuit 206 includes a pre-driver 240, a voltage
tolerant circuit 242, a driver circuit 244, a switching well 246, and a
pull-up/pull-down circuit 248. The pre-driver 240 is coupled to the buffer gates 214-1 through
214-5 via the line 224. The pre-driver 240 has inputs 250-1, 250-2, and 250-3. The
input 250-3 is coupled to receive an output enable signal for enabling the
pre-driver 240. The line 250-1 provides a voltage select signal ("VSEL") and the line
250-2 provides a drive select signal ("DRIVESELECT") for providing,
respectively, voltage selection and power configuration at the pad 110-1 in
accordance with TABLE I below.

TABLE I

VSEL    VDD     VDDD    VDIO    DRIVESELECT
Low     2.5V    3.3V    3.3V    High/Low
High    2.5V    3.3V    2.5V    High/Low
Low     2.5V    2.5V    2.5V    High/Low

Referring to Table I, there are shown voltage select and power
configurations for the pad 110-1. The signals VSEL and DRIVESELECT can be
gated together using conventional CMOS logic (e.g., NAND, NOR gates) to select
a voltage/power configuration on the pad 110-1. The drive on the pad 110-1 is
adjusted, for example, by turning on/off a predetermined number of CMOS
gates/buffers (not shown) coupled in parallel. The power supplies Vdio, Vddd,
and Vdd are described in further detail below in conjunction with Figure 6. It is
noted that Vdio cannot be higher than Vddd.
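
The voltage selection in TABLE I can be expressed as a simple lookup. This is a sketch under stated assumptions: it assumes the table reads row by row as (VSEL, Vdd, Vddd, Vdio) with DRIVESELECT free to be High or Low in every row, and the names `TABLE_I` and `pad_voltage` are hypothetical.

```python
# Rows of TABLE I as (VSEL, Vdd, Vddd, Vdio), voltages in volts.
# Assumption: DRIVESELECT may be High or Low in every row, so it is not a key here.
TABLE_I = [
    ("Low", 2.5, 3.3, 3.3),
    ("High", 2.5, 3.3, 2.5),
    ("Low", 2.5, 2.5, 2.5),
]

def pad_voltage(vsel, vddd):
    """Return Vdio for a given VSEL level and Vddd supply, per the rows above."""
    for row_vsel, _vdd, row_vddd, vdio in TABLE_I:
        if row_vsel == vsel and row_vddd == vddd:
            return vdio
    raise ValueError(f"no TABLE I row for VSEL={vsel}, Vddd={vddd}")

# Check the stated constraint that Vdio is never higher than Vddd.
constraint_holds = all(vdio <= vddd for _, _, vddd, vdio in TABLE_I)
```

For example, `pad_voltage("High", 3.3)` selects the 2.5V pad voltage while the Vddd supply stays at 3.3V.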

The voltage tolerant circuit 242 is coupled to the pre-driver 240 by the line 
252 for receiving data from the data capture circuit 204. The voltage tolerant 
circuit 242 is also coupled to the switching well 246 by a line 254 and to a leak 
enable signal by a line 256. 

The present invention works with a number of operating voltages (e.g.,
3.3V, 3.3V/2.5V, 5V). The voltage tolerant circuit 242 provides the data buffer
130 with 5 volt tolerance when using 3 volt process technology. For example, in
mixed voltage environments the data buffer 130 may drive an output to 3 volts
(high) and 0 volts (low). An external source driving the data buffer 130 may,
however, drive an output to 3 volts (high) or to 5 volts (high). While the
operation of the data buffer 130 is unaffected when the external source switches
between 0 volts and 3 volts, if the external source goes to 5 volts, potential
problems of shorting with an internal 3 volt supply or failure of internal devices
can occur. The voltage tolerant circuit 242 is designed to overcome these
problems. The voltage tolerant circuit 242 is the subject matter of U.S. Patent
Application Serial No. 08/801,002, filed on February 19, 1997, entitled "Voltage
Tolerant Input/Output Buffer," incorporated herein by reference.

The driver circuit 244 is coupled to the voltage tolerant circuit 242 by a line
258. The driver circuit 244 has an input 260 for receiving the combined
VSEL/DRIVESELECT signals from the core 120 for adjusting the voltage on the
pad 110-1. The driver circuit 244 has an input 262 for receiving a control voltage
Vdio.

The switching well 246 is coupled to the voltage tolerant circuit 242 by the
line 254 and includes, for example, pMOS logic for switching between the voltage
on the pad 110-1 and a supply voltage (e.g., Vddd). An output of the switching
well 246 is also coupled to the pad 110-1 and the pull-up/pull-down circuit 248
via a line 249. The switching well 246 works in conjunction with the voltage
tolerant circuit 242 and the driver circuit 244 for comparing the voltage on the
pad 110-1 with a supply voltage and switching the higher of these two voltages
to the switching well 246. The pull-up/pull-down circuit 248 is coupled to lines
248-1 and 248-2 for receiving a pull-up signal and a pull-down signal,
respectively.

The pull-up/pull-down circuit 248 includes a pull-up resistor and a
pull-down resistor (not shown). The pull-up resistor is used to pull up the voltage on
the pad 110-1. The pull-down resistor is used to pull down the voltage on the pad
110-1. The pull-up signal enables/disables the pull-up resistor. The pull-down
signal enables/disables the pull-down resistor. It is noted that these signals
should be used in conjunction with the output enable signal on line 256 and
should not be enabled at the same time.

It is noted that additional signals may be provided to the data buffer 130. 
For example, an AGP select signal can be provided to the data buffer 130 for 
switching between AGP and DDR/SSTL. Also, a bypass signal can be provided 
to the data buffer 130 to bypass the latches 212-1 through 212-3 in the data 
capture circuit 204. The bypass signal can be used, for example, to provide 
backward compatibility with older protocols. The AGP select signal and the bypass
signal can be implemented with, for example, a combination of CMOS logic gates
(e.g., NAND, NOR).

The output circuit 200 described above provides a flexible and reliable I/O
interface for buffering signals communicated between the core 120 and the pad
110-1 while maintaining on-the-fly compliance with an operative protocol.
Specifically, the output circuit 200 enables the delivery of signals to external high
frequency busses in TTL format (PCI), edge-centered format (AGP, DDR), or
window-strobe mode with precise timing control. It is noted that the output
clock trees 208 and 210 are designed using custom physical layouts that provide
tight control of clock parameters (e.g., skew, duty cycle, rise/fall times).

Referring now to Figure 2B, there is shown the input circuit 202 in
accordance with the present invention. The input circuit 202 includes a data
capture circuit 203 and a signal conditioning circuit 205.

The signal conditioning circuit 205 includes a differential amplifier 264 and
a Schmitt trigger 266. The differential amplifier 264 has inputs 264-1 through
264-4. The input 264-1 is for receiving differential input signals (e.g., DDR/SSTL)
from the pad 110-1. The input 264-2 is coupled to a reference voltage (hereinafter
also referred to as "Vref") generated off-chip. The input 264-3 is coupled to a bias
voltage (hereinafter also referred to as "Vbias") generated on-chip. The bias
voltage is set, for example, by a current mirror which generates a current source
in the differential amplifier 264. The input 264-4 is coupled to a power down
signal for powering down the differential amplifier 264.

The differential amplifier 264 compares the voltage signal on the pad 110-1 
with the reference voltage and generates a logic 1 if the voltage is greater than the 
reference voltage and generates a logic 0 if the voltage is less than the reference
voltage. The peak-to-peak voltage swing for a DDR/SSTL formatted signal is 
typically about 0.8 volts and the reference voltage is typically about 1.5 volts. 
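
The comparison performed by the differential amplifier 264 can be illustrated with an ideal comparator. This is a minimal sketch, not a circuit model: it assumes the 0.8 volt swing is centered on the 1.5 volt reference, and the name `compare` is hypothetical.

```python
VREF = 1.5       # reference voltage in volts (typical value given in the text)
SWING_PP = 0.8   # peak-to-peak swing in volts (typical value given in the text)

def compare(v_pad, v_ref=VREF):
    """Ideal comparator: logic 1 above the reference, logic 0 below it.
    A pad voltage exactly equal to the reference is treated as 0 here,
    a case the text does not specify."""
    return 1 if v_pad > v_ref else 0

# Assumption: the swing is centered on Vref, so levels sit at Vref +/- half the swing.
v_high = VREF + SWING_PP / 2   # nominally 1.9 V
v_low = VREF - SWING_PP / 2    # nominally 1.1 V

bits = [compare(v) for v in (v_low, v_high, v_high, v_low)]
```

Under that centering assumption the reduced swing signal moves only about 0.4 volts to either side of the reference, which is why a comparator against Vref rather than a full swing input stage is used to recover the logic level.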

The Schmitt trigger 266 has inputs 266-1 and 266-2. The input 266-1 is for
receiving TTL formatted signals from the pad 110-1. The input 266-2 is coupled
to a power down signal for powering down the Schmitt trigger 266. The Schmitt
trigger 266 provides TTL formatted signals to the core 120 via a line 272. The line
272 can be set to a logic 0 when not receiving TTL formatted signals.

The Schmitt trigger 266 is a conventional circuit that provides, for example,
fast level transitions by using hysteresis to derive a clean edge from a jittery or
slowly varying waveform at the pad 110-1.
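
The effect of hysteresis can be seen by comparing a two-threshold detector with a single-threshold one on the same noisy samples. The sketch below is behavioral only; the threshold values, the sample data, and the names `schmitt` and `transitions` are illustrative assumptions, not values from the text.

```python
def schmitt(samples, v_low=0.8, v_high=2.0, state=0):
    """Behavioral Schmitt trigger: switch to 1 only at or above v_high,
    and back to 0 only at or below v_low (thresholds are assumed values)."""
    out = []
    for v in samples:
        if state == 0 and v >= v_high:
            state = 1
        elif state == 1 and v <= v_low:
            state = 0
        out.append(state)
    return out

def transitions(bits):
    """Count how many times the output changes level."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

# A slowly rising and falling waveform with jitter near mid-level (made-up samples).
noisy = [0.2, 1.3, 1.5, 1.3, 1.6, 2.1, 1.5, 1.3, 1.5, 0.7, 0.3]

clean = schmitt(noisy)                      # two thresholds, with hysteresis
naive = [1 if v > 1.4 else 0 for v in noisy]  # single threshold, no hysteresis
```

With these samples the single-threshold detector toggles six times, while the hysteresis version produces one rising and one falling edge.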

The data capture circuit 203 includes latches 268-1 through 268-3 and an 
input clock tree 270 (hereinafter also referred to as "ICK"). It is noted that the 
input clock tree 270 provides a buffered clock/strobe that is generated off-chip
by an external source. The latch 268-1 is coupled to an output of the differential
amplifier 264 by a line 274 for receiving signals from the differential amplifier
264. An input 276 of the latch 268-1 is coupled to the input clock tree 270 for
receiving a clock/strobe. Additionally, a line 278 is coupled to an output of the
latch 268-1 for providing data at the negative edge of the clock/strobe.

An input of the latch 268-2 is coupled to the output of the latch 268-1 by a 
line 280. An input 282 of the latch 268-2 is coupled to the clock tree 270 for 
receiving an inverted clock/strobe. A line 284 is coupled to an output of the latch
268-2 for providing data at the next positive edge of the clock/strobe.

The latch 268-3 is coupled to receive data from the differential amplifier 
264 via a line 286. An input 288 of the latch 268-3 is coupled to the clock tree 270 
for receiving the inverted clock/strobe. A line 290 coupled to an output of the
latch 268-3 provides data at the next positive edge of the inverted clock/strobe.

The input circuit 202 described above provides a flexible and reliable I/O 
interface for buffering signals communicated between the core 120 and the pad 
110-1 while maintaining on-the-fly compliance with an operative protocol. 
Specifically, the input circuit 202 enables the reliable capture of data received 
either in window-strobe mode (DDR) or edge strobe mode (AGP). Alternatively, 
the input circuit 202 captures data from busses running PCI protocol. It is noted 
that the input clock tree 270 is designed using a custom physical layout that 
provides tight control of clock parameters (e.g., skew, duty cycle, rise/fall times).

Referring to Figure 3, there is shown a functional block diagram of one
embodiment of the clock/strobe buffer 140-1 in accordance with the present
invention. The clock/strobe buffer 140-1 includes a differential amplifier 300, a
programmable delay module 302, a gated buffer 304, a Schmitt trigger 306, and a
switch well 308. The clock/strobe buffer 140-1 also includes an output clock tree
310, an output clock tree 312, and an input clock tree 314.

The clock/strobe buffer 140-1 is an input buffer that includes
programmable circuitry for conditioning (e.g., delaying) external clock/strobes
from the pad 110-2 for use by the data buffer 130. This circuitry is preferably
implemented using 0.25µ CMOS technology as described in further detail below.

The differential amplifier 300 has inputs 300-1 through 300-4. The input 
300-1 is for receiving differential clock/strobe signals (e.g., DDR/SSTL) from the 
pad 110-2. The input 300-2 is coupled to a reference voltage generated off-chip. 
The input 300-3 is coupled to a bias voltage generated on-chip. The bias voltage 
is set, for example, by a current mirror which generates a current source in the
differential amplifier 300. The input 300-4 is coupled to a power down signal for
powering down the differential amplifier 300. 

The differential amplifier 300 compares the voltage of a signal on the pad
110-2 with the reference voltage and generates a logic 1 if the voltage is greater
than the reference voltage and a logic 0 if the voltage is less than the
reference voltage. The peak-to-peak voltage swing is typically about 0.8 volts
and the reference voltage is typically about 1.5 volts.
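
The comparison performed by the differential amplifier 300 can be illustrated with a short behavioral sketch. This is a minimal model, not the circuit itself: the function name, the treatment of a voltage exactly equal to the reference, and the assumption that the 0.8 volt swing is centered on the 1.5 volt reference are all illustrative choices that the description does not state.

```python
def differential_amplifier(pad_voltage, vref=1.5, powered_down=False):
    """Behavioral model of the comparison: logic 1 above the reference, logic 0 below.

    Returns None when the power down input is asserted (assumed behavior).
    A voltage exactly equal to the reference is reported as 0 (assumed).
    """
    if powered_down:
        return None
    return 1 if pad_voltage > vref else 0


# Assumed example: a 0.8 V peak-to-peak swing centered on the 1.5 V reference,
# so the pad moves between 1.1 V and 1.9 V.
samples = [1.1, 1.4, 1.6, 1.9]
levels = [differential_amplifier(v) for v in samples]
print(levels)
```

Under these assumptions, the two samples below the reference resolve to logic 0 and the two above it resolve to logic 1.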

The programmable delay module 302 is coupled to the differential
amplifier 300 by a line 316 for receiving a clock/strobe. An input 318 of the
programmable delay module 302 is coupled to receive a plurality of program bits
which are used to program the delay to be added to the clock/strobe. These bits
are indicative of a range of programmable delay values where, for example, 0000
provides the least delay and 1111 provides the most delay. The amount of delay
to be added is determined by protocol. The delay is added to eliminate skew
between received data and the clock/strobe to guarantee reliable data capture.
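
The mapping from program bits to added delay can be sketched as follows. The description only states that 0000 gives the least delay and 1111 the most; the linear mapping, the step size `STEP_PS`, and the minimum delay `BASE_PS` used here are hypothetical values chosen for illustration.

```python
STEP_PS = 100  # hypothetical delay added per code step, in picoseconds
BASE_PS = 0    # hypothetical delay at code 0000, in picoseconds


def programmed_delay_ps(bits):
    """Map four program bits (e.g. '0101') to a delay, assuming a linear scale."""
    if len(bits) != 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected four program bits such as '0101'")
    return BASE_PS + int(bits, 2) * STEP_PS


def delay_edges(edge_times_ps, bits):
    """Shift every clock/strobe edge by the programmed delay."""
    delay = programmed_delay_ps(bits)
    return [t + delay for t in edge_times_ps]


# Strobe edges at 0 ps and 5000 ps, delayed by code 0010.
print(delay_edges([0, 5000], "0010"))
```

In a real interface the code would be chosen per protocol so that the delayed strobe edge falls where the received data is stable.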



The gated buffer 304 is coupled to the programmable delay module 302 by
a line 320. The gated buffer 304 is also coupled to the input clock tree 314 which
provides a buffered version of the clock on pad 110-2. For example, the
clock/strobe at the pad 110-2 is gated into a 32-bit buffered clock/strobe
comprising two 16-bit groups which are distributed to the data buffer 130. The
gated buffer 304 is further coupled to a line 322 for receiving a clock/strobe valid
signal from the core 120 and a line 324 for directly feeding the clock/strobe on
the pad 110-2 to the core 120.

The Schmitt trigger 306 has inputs 306-1 and 306-2 and is coupled to the
gated buffer 304 by a line 326. The input 306-1 is for receiving a TTL formatted
clock/strobe from the pad 110-2. The input 306-2 of the Schmitt trigger 306 is
coupled to a power down signal for powering down the Schmitt trigger 306.

The Schmitt trigger 306 is a conventional circuit that provides, for example,
fast level transitions by using hysteresis to derive a clean clock from a jittery or
slowly varying waveform at the pad 110-2.
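
The effect of hysteresis can be illustrated with a short behavioral sketch. The two thresholds below (0.8 V and 2.0 V, typical TTL input levels) are assumptions; the description does not give the thresholds of the Schmitt trigger 306.

```python
def schmitt_trigger(samples, v_low=0.8, v_high=2.0, initial=0):
    """Behavioral Schmitt trigger with assumed thresholds.

    The output rises only when the input reaches v_high and falls only when
    the input drops to v_low, so wiggles between the thresholds are ignored.
    """
    state = initial
    output = []
    for v in samples:
        if state == 0 and v >= v_high:
            state = 1
        elif state == 1 and v <= v_low:
            state = 0
        output.append(state)
    return output


# A slowly rising, noisy edge followed by a noisy falling edge.
noisy = [0.0, 1.0, 1.5, 1.2, 2.1, 1.8, 2.2, 1.0, 0.7, 1.1, 0.2]
print(schmitt_trigger(noisy))
```

The noisy input crosses the midpoint several times, but the output changes state only once on the way up and once on the way down.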

The switch well 308 has an input 328 for receiving a leak enable signal for
enabling a weak leak path while operating in a 5V tolerant mode as described
previously in conjunction with the signal conditioning circuit 206 (Figure 2A).
The switch well 308 includes circuitry for providing a mechanism to keep the
n-well at a potential which is the higher of either the pad 110-2 or a supply voltage.
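
The well-bias rule stated above reduces to selecting the larger of two voltages. The sketch below is a behavioral illustration only; the 3.3 V supply value in the example is an assumption, and the function name is hypothetical.

```python
def n_well_bias(pad_voltage, supply_voltage):
    """Return the n-well potential: the higher of the pad and the supply."""
    return max(pad_voltage, supply_voltage)


# Assumed 3.3 V supply with a 5 V signal driven onto the pad from off-chip.
print(n_well_bias(5.0, 3.3))
# Normal operation: the pad stays at or below the supply.
print(n_well_bias(1.2, 3.3))
```

Keeping the well at the higher potential prevents the junction between a p-device and its n-well from becoming forward biased when the pad rises above the supply.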

Referring to Figures 4A and 4B, there is shown a functional block diagram
of one embodiment of the clock/strobe buffer in Figure 1A including a matching
delay dummy multiplexer in accordance with the present invention. The
clock/strobe buffer 140-2 includes an output circuit 400 (Figure 4A) and an input
circuit 402 (Figure 4B). The output circuit 400 and the input circuit 402 are each
coupled to the pad 110-2 and the core 120 for buffering and distributing
clock/strobes. The output circuit 400 is coupled to the input circuit 402 at letter
designations A and B.

The output circuit 400 includes a latch 402, multiplexers 404-1 and 404-2, a
pre-driver 406, a voltage tolerant circuit 408, a driver circuit 410, a matching
delay dummy multiplexer 412, and output clock trees 414 and 416.

The latch 402 has an input 418 coupled to the core 120 for receiving either a
ground supply (hereinafter also referred to as "Vss"), an output signal (TTL), or a
data/clock output clock ("1X" SSTL). The latch 402 has an input 420 coupled to
the output clock tree 414 for receiving an inverted delayed clock. The delayed
clock is derived from, for example, a PLL in the core 120. The delayed clock is
described in further detail below.

The multiplexer 404-1 is coupled to the latch 402 by a line 422 and coupled
to the matching delay dummy multiplexer 412 by a line 424. The latch 402 holds a
signal on line 422 until the next clock/strobe. The multiplexer 404-1 also has an
input 426 for receiving a select AGP signal. For example, if the select AGP signal
is a logic 1, then AGP is enabled. Similarly, if the select AGP signal is a logic 0,
then SSTL is enabled. A line 425 is coupled to the latch 402 and is used to reset
the latch 402.

The multiplexer 404-2 is coupled to the multiplexer 404-1 by a line 428.
The multiplexer 404-2 also has an input 430 and an input 432. The input 430 is
for receiving a bypass signal to enable a bypass mode. The input 432 is coupled
to line 418 for receiving the clock/strobe output from the core 120.

The pre-driver 406 is coupled to the multiplexer 404-2 by a line 434. The
pre-driver 406 has an input 436 and an input 438. The input 436 is for receiving
an output enable signal for enabling the output of the voltage tolerant circuit 408.
The input 438 is for receiving a voltage selection signal for controlling the output
voltage on the pad 110-2.
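
The selection performed by the multiplexers 404-1 and 404-2 ahead of the pre-driver 406 can be sketched behaviorally. The description does not state which multiplexer input is passed for each select value, so the polarities below are assumptions: the select AGP signal is assumed to pass the line 424 (from the matching delay dummy multiplexer 412) when it is a logic 1 and the line 422 (from the latch 402) when it is a logic 0, and the bypass signal is assumed to pass the input 432 directly.

```python
def output_path(latched, matched, core_direct, select_agp, bypass):
    """Behavioral sketch of the two-stage selection feeding line 434.

    latched     -- value on line 422 (from latch 402)
    matched     -- value on line 424 (from dummy multiplexer 412)
    core_direct -- value on input 432 (clock/strobe straight from the core)
    """
    # Multiplexer 404-1: assumed to pass line 424 for AGP, line 422 otherwise.
    line_428 = matched if select_agp else latched
    # Multiplexer 404-2: assumed to pass input 432 when bypass mode is enabled.
    line_434 = core_direct if bypass else line_428
    return line_434


print(output_path(latched=0, matched=1, core_direct=0, select_agp=1, bypass=0))
print(output_path(latched=1, matched=0, core_direct=0, select_agp=0, bypass=0))
print(output_path(latched=0, matched=0, core_direct=1, select_agp=1, bypass=1))
```

With these assumed polarities, each of the three calls selects a different source and returns that source's value.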

The voltage tolerant circuit 408 is coupled to the pre-driver 406 by a line 440.
The voltage tolerant circuit 408 has an input 442 for receiving a leak enable signal for
enabling the weak leak path when operating in a 5V tolerant mode. The voltage
tolerant circuit 408 is described in further detail in U.S. Patent Application Serial
No. 08/801,002, filed on February 19, 1997, entitled "Voltage Tolerant
Input/Output Buffer," incorporated herein by reference.

The driver circuit 410 is coupled to the voltage tolerant circuit 408 by a line
444. The driver circuit 410 has inputs 410-1 through 410-3. The input 410-1 is for
receiving the control voltage Vdio. The input 410-2 is for receiving a pull-up
enable signal for raising the voltage on the pad 110-2 using a pull-up resistor (not
shown). Similarly, the input 410-3 is for receiving a pull-down enable signal for
lowering the voltage on the pad 110-2 using a pull-down resistor (not shown).

The matching delay dummy multiplexer 412 is coupled to the output clock
tree 416 by a line 421 and to the multiplexer 404-1 by the line 424. The matching
delay dummy multiplexer 412 receives on the line 421 a single rate clock/strobe. The
clock/strobe is processed by the matching delay dummy multiplexer 412 as described
below in conjunction with Figure 5.

The output circuit 400 described above provides a flexible and reliable I/O
interface for buffering clock/strobes communicated between the core 120 and the
pad 110-2 while maintaining on-the-fly compliance with an operative protocol.

Referring to Figure 4B, the input circuit 402 includes a differential
amplifier 446, a gated buffer 448, a Schmitt trigger 450, buffers 452-1 and 452-2,
and an input clock tree 454. The input circuit 402 also includes the output
clock trees 414 and 416 previously described in conjunction with Figure 4A.

The differential amplifier 446 has inputs 446-1 through 446-4. The input
446-1 is coupled to the pad 110-2 for receiving differential clock/strobe signals
from the pad 110-2. The input 446-2 is for receiving a reference voltage which is
generated off-chip. The input 446-3 is for receiving a bias voltage which is
generated on-chip. The input 446-4 is for receiving a power down signal for
powering down the differential amplifier 446.

The gated buffer 448 is coupled to the differential amplifier 446 by a line
456 and provides a buffered version of the clock/strobe on the pad 110-2. The
gated buffer 448 is also coupled to a line 458 which provides a feed-through
clock/strobe to the core 120. The gated buffer 448 also has an input 460 for
receiving a clock/strobe valid signal.

The buffers 452-1 and 452-2 are coupled to a data/clock output clock 453
and a delayed data/clock output clock 455, respectively, and provide drive to
both ends of these clocks. The buffer 452-1 also has an output 462 for providing a
feedback clock to, for example, a PLL disposed in the core 120. The data/clock
output clock 453 receives either a "1X" data output clock (AGP), a clock output
clock ("2X" SSTL), or a ground bus Vss ("1X" SSTL or TTL). The delayed
data/clock output clock 455 receives either a "2X" data clock (AGP), a data
output clock (SSTL), or a ground bus Vss (TTL).
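
The per-protocol clock assignments just listed can be collected into a lookup table. The sketch below only restates the assignments given above; the dictionary layout, the key strings, and the function name are illustrative choices.

```python
# Signal carried by each clock bus for each operating mode, as listed above.
CLOCK_SOURCES = {
    "data/clock output clock 453": {
        "AGP": "1X data output clock",
        "2X SSTL": "clock output clock",
        "1X SSTL": "Vss",
        "TTL": "Vss",
    },
    "delayed data/clock output clock 455": {
        "AGP": "2X data clock",
        "SSTL": "data output clock",
        "TTL": "Vss",
    },
}


def clock_source(bus, mode):
    """Look up which signal a clock bus carries in a given mode."""
    return CLOCK_SOURCES[bus][mode]


print(clock_source("data/clock output clock 453", "AGP"))
print(clock_source("delayed data/clock output clock 455", "TTL"))
```

A bus tied to Vss in a given mode carries no clock, which corresponds to the modes in which that clock is unused.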

The Schmitt trigger 450 has an input 466 for receiving a TTL formatted
clock/strobe from the pad 110-2. The Schmitt trigger 450 is coupled to the gated
buffer 448 by a line 464. The Schmitt trigger 450 also has an input 468 for
receiving a power down signal for powering down the Schmitt trigger 450.

The Schmitt trigger 450 is a conventional circuit that provides, for example,
fast level transitions by using hysteresis to derive a clean clock/strobe from a
jittery or slowly varying clock/strobe at the pad 110-2.

The input circuit 402 described above provides a flexible and reliable I/O
interface for buffering and distributing clock/strobes while maintaining
on-the-fly compliance with an operative protocol. It is noted that the clock trees 414,
416, and 454 are designed using custom physical layouts that provide tight
control of clock parameters (e.g., skew, duty cycle, rise/fall times).






CASE 3156 



PATENT 



Referring to Figure 5, there is shown a functional block diagram of one
embodiment of the matching delay dummy multiplexer 412 in Figure 4A in
accordance with the present invention. The matching delay dummy multiplexer
412 includes latches 500-1 and 500-2 and a multiplexer 504.

The latch 500-1 has inputs 506 and 508. The input 506 is for receiving data
derived from the core 120. The input 508 is coupled to the clock tree 416 shown in
Figures 4A and 4B for receiving a single rate clock/strobe.

The latch 500-2 has inputs 510 and 512. The input 510 is for receiving data
derived from the core 120. The input 512 is coupled to the clock tree 416 shown
in Figures 4A and 4B for receiving an inverted single rate clock/strobe.

The multiplexer 504 is coupled to the latches 500-1 and 500-2 by lines 514
and 516, respectively, for receiving latched clocks/strobes. The multiplexer 504
is also coupled to the single rate clock/strobe by a line 518. The single rate
clock/strobe is used to trigger the latches 500-1 and 500-2 and the multiplexer 504. The
multiplexer 504 provides a delayed signal on line 520.

The matching delay dummy multiplexer 412 essentially mimics delay to minimize
skew between the strobe derived from the PLL and data at output 520 as shown
in Figure 5.
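
The latch-and-multiplexer structure of Figure 5 can be illustrated with a zero-delay behavioral sketch. Several details are assumptions not stated in the description: the latches are modeled as level-sensitive, the latch inputs are tied to the constants 0 and 1 so that the output reproduces the strobe, and the multiplexer is assumed to select the latch that is currently holding its value. The sketch shows only the logical path; the matching propagation delay itself is a property of the physical circuit and is not modeled.

```python
class Latch:
    """Level-sensitive latch: transparent while enable is 1, holds otherwise."""

    def __init__(self, q=0):
        self.q = q

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


def dummy_mux_output(clock_samples, d1=0, d2=1):
    """Pass a strobe through two latches and a clock-selected multiplexer.

    d1 and d2 are the assumed constant inputs of latches 500-1 and 500-2.
    """
    latch_1, latch_2 = Latch(), Latch()
    output = []
    for clk in clock_samples:
        q1 = latch_1.step(d1, clk)       # latch 500-1: single rate clock
        q2 = latch_2.step(d2, 1 - clk)   # latch 500-2: inverted clock
        # Assumed select polarity: output the latch that is holding.
        output.append(q2 if clk else q1)
    return output


print(dummy_mux_output([0, 1, 0, 1]))
```

Because the strobe traverses the same kind of latch and multiplexer stages as the data path, a circuit built this way would give the strobe a delay comparable to that of the data, which is the stated purpose of the matching delay dummy multiplexer 412.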

Referring to Figure 6A, there is shown a layout representation illustrating
one embodiment of a bussing/n-well scheme in accordance with the present
invention. The scheme includes six power busses and three ground busses
disposed on at least one of a plurality of metal layers (hereinafter also referred to
as M2 through M5), six n-wells, and three clock busses CCK_1X, DCK_2X, and
ICK. The clock busses CCK_1X, DCK_2X, and ICK correspond to the clock trees
210, 208, and 270, respectively, shown in Figures 2A and 2B. The n-wells
preferably are created in a p-type substrate (wafer) using conventional n-well
CMOS process technology (e.g., diffusion). For clarity, diffusion regions and
circuitry are not shown in Figures 6A and 6B. It is noted that guard rings may be
coupled to, for example, the Vddd or Vssd busses to guard against latch-up from
transient signals on the pad 110-1. Moreover, the data buffer circuitry (not
shown) is preferably grounded on both sides to reduce noise coupling.

The power busses include positive supplies Vdd, Vddd, Vddw, Vdio, Vref,
and Vbias. Vdd is generated off-chip and provides power to circuitry in the core
120. Vddd is also generated off-chip and provides power to most of the circuitry
(e.g., thick p-devices) in the I/O ring. Two exceptions are the driver circuit 244
(Figure 2A) and the driver circuit 410 (Figure 4A) which are powered by Vdio.
Vdio is electrically isolated from Vddd so that the voltage on the pads 110-1
and 110-2 can be adjusted without disturbing the power supply to the other
circuits in the I/O ring. Vddw is a back-up power bus for coupling switch wells
(Figure 2A) in the I/O ring. Under normal operation, these switch wells are not
actively coupled to other circuitry in the I/O ring.

The bussing/n-well scheme further includes ground busses Vss, Vssub, and
Vssd. Vss supplies circuitry in the core 120 with a zero voltage reference or
ground. Vssub is coupled to a substrate for providing circuitry with a zero
voltage reference. Vssd supplies the circuitry in the I/O ring with a zero voltage
reference.

Vref and Vbias provide a reference voltage and a bias voltage, respectively,
to the differential amplifiers 264 (Figure 2B), 300 (Figure 3), and 446 (Figure 4B).
Vref is derived off-chip from Vdd. Vbias is generated on-chip.

The power busses just described are each coupled to at least one n-well
(e.g., Vdd, Vddd) for supplying power to the circuitry (e.g., p-devices) in the data
buffer 130 and clock/strobe buffers 140-1 and 140-2. The power busses are
coupled to the n-wells through one or more metal layers, M2 through M5, which
are made of, for example, aluminum. The metal layers are electrically coupled
together using conventional techniques (e.g., vias, polysilicon layers).

For example, the bus Vddd is disposed on metal layers M4 and M5 which
are coupled to a Vddd n-well with, for example, metal interconnects. Similarly,
the bus Vdio is disposed on metal layers M2 through M5. The clock busses
CCK_1X and DCK_2X are each disposed on metal layers M4 and M5.




Referring to Figure 6B, there is shown a cross-sectional view of the layout
representation in Figure 6A in accordance with the present invention. The
cross-section includes a p-type substrate with n-wells diffused therein. The pads 110-1
and 110-2 are coupled to the metal layers M2-M5 for providing power to the
power busses Vdd and Vddd. The power busses Vdd and Vddd are further
coupled to the n-wells for providing power to p-devices disposed thereon.

Referring to Figure 7, there is shown a layout representation illustrating
several embodiments of an I/O assembly 700 for an integrated circuit chip in
accordance with the present invention. The I/O assembly 700 includes breaker
cells 702, data cells 704, and clock trees ICK, DCK_2X, and CCK_1X, as
previously described above in conjunction with Figures 2-7. The data cells 704
each include an embodiment of the data buffer 130 (not shown), previously
described in conjunction with Figures 2A and 2B. It is noted that the vertical
dashed lines in Figure 7 indicate that additional data cells 704 may be added to
the I/O assembly 700.

The breaker cells 702 are disposed on the top and the bottom of the data
cells 704 as shown in Figure 7. The breaker cells 702 isolate the clock trees
CCK_1X, DCK_2X, and ICK, so that different protocols can be executed in
synchronous mode. For example, the breaker cell 702-1 and the breaker cell 702-2
isolate the clock tree CCK_1X wire segments from the adjacent CCK_1X wire
segments (702-1) and the CCK_1X wire segments connected to Vss (702-2). It is
noted that all three clock trees are disconnected from the data cells 704 while
operating in asynchronous mode.

In one embodiment of the I/O assembly 700, a data cell 704-1 is coupled
to clock trees ICK, DCK_2X, and CCK_1X, for capturing and buffering data
signals from the core 120 or pad 110-1 as described in conjunction with Figures
2A and 2B. In another embodiment of the I/O assembly 700, a data cell 704-2 is
coupled to output clock tree DCK_2X for capturing output data signals from the
core 120 at a double clock rate as described in conjunction with Figures 2A and
2B. In yet another embodiment of the I/O assembly 700, a data cell 704-3 is
coupled to output clock trees CCK_1X and DCK_2X for capturing output data
signals from the core 120 at either a single clock rate or a double clock rate as
described in conjunction with Figures 2A and 2B. In still another embodiment of
the I/O assembly 700, a data cell 704-4 is coupled to the input clock tree ICK for
capturing input data signals from the pad 110-1 as described in conjunction with
Figures 2A and 2B.
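
The four data cell embodiments above differ only in which clock trees they are coupled to. The sketch below records that connectivity and derives what each cell can do from it. The dictionary layout, the capability labels, and the function name are illustrative; the rule that ICK implies input capture, CCK_1X single-rate output, and DCK_2X double-rate output is read from the descriptions of the data cells 704-2, 704-3, and 704-4.

```python
# Clock trees coupled to each data cell embodiment of Figure 7.
DATA_CELL_CLOCKS = {
    "704-1": {"ICK", "DCK_2X", "CCK_1X"},
    "704-2": {"DCK_2X"},
    "704-3": {"CCK_1X", "DCK_2X"},
    "704-4": {"ICK"},
}


def capabilities(cell):
    """List what a data cell can capture, based on its clock tree coupling."""
    trees = DATA_CELL_CLOCKS[cell]
    caps = []
    if "ICK" in trees:
        caps.append("input capture")
    if "CCK_1X" in trees:
        caps.append("single-rate output")
    if "DCK_2X" in trees:
        caps.append("double-rate output")
    return caps


for cell in sorted(DATA_CELL_CLOCKS):
    print(cell, capabilities(cell))
```

A cell with no clock trees coupled would return an empty list, which corresponds to the asynchronous mode in which all three clock trees are disconnected from the data cells.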

Referring to Figure 8, there is shown a layout representation illustrating
one embodiment of an I/O assembly 800 for AGP in accordance with the present
invention. The I/O assembly 800 includes breaker cells 802, data cells 804, and
clock trees ICK, DCK_2X, and CCK_1X, as previously described above in
conjunction with Figures 2-7. The data cells 804 include embodiments of the data
buffer 130 (not shown) previously described in conjunction with Figures 2A and
2B. As already noted, the vertical dashed lines indicate that additional data cells
804 may be added to the I/O assembly 800.

The breaker cells 802 are for isolating the clock trees CCK_1X, DCK_2X,
and ICK, so that different protocols can be executed in synchronous mode. It is
noted that all three clock trees are disengaged when operating in asynchronous
mode. The data cells are coupled to clock trees ICK, DCK_2X, and CCK_1X, for
capturing and buffering data signals from the core 120 or pad 110-1 using AGP
protocol, as described in conjunction with Figures 2A and 2B.

Referring to Figure 9, there is shown a layout representation illustrating
one embodiment of an I/O assembly 900 for DDR in accordance with the present
invention. The I/O assembly 900 includes breaker cells 902, data cells 904, and
clock trees ICK, DCK_2X, and CCK_1X, as previously described above in
conjunction with Figures 2-7. The data cells 904 include embodiments of the data
buffer 130 (not shown) previously described in conjunction with Figures 2A and
2B. As already noted, the vertical dashed lines indicate that additional data cells
904 may be added to the I/O assembly 900.

The breaker cells 902 are for isolating the clock trees CCK_1X, DCK_2X,
and ICK, so that different protocols can be executed in synchronous mode.
Again, it is noted that all three clock trees are disengaged when operating in
asynchronous mode. The data cells are coupled to output clock trees DCK_2X
and CCK_1X, for capturing and buffering data signals from the core 120 using
DDR protocol, as described in conjunction with Figures 2A and 2B.

Although the present invention has been described in considerable detail 
with reference to certain preferred embodiments thereof, other embodiments are 
possible. For example, it is possible that additional protocols and signal 
specifications may be developed that are applicable to the present invention. 
Similarly, it is possible that alternative custom layouts and process technology 
may be used to implement the present invention. Therefore, the spirit and scope 
of the appended claims should not be limited to the description of the preferred 
embodiments herein.